Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages
نویسندگان
چکیده
The focus of this paper is estimating articulatory movements of the tongue and lips from acoustic speech data. While there are several potential applications of such a method in speech therapy and pronunciation training, performance of such acoustic-to-articulatory inversion systems is not very high due to limited availability of simultaneous acoustic and articulatory data, substantial speaker variability, and variable methods of data collection. This paper therefore evaluates the impact of speaker, language and accent variability on the performance of an acoustic-to-articulatory speech inversion system. The articulatory dataset used in this study consists of 21 Dutch speakers reading Dutch and English words and sentences, and 22 UK English speakers reading English words and sentences. We trained several acoustic-to-articulatory speech inversion systems both based on deep and shallow neural network architectures in order to estimate electromagnetic articulography (EMA) sensor positions, as well as vocal tract variables (TVs). Our results show that with appropriate feature and target normalization, a speaker-independent speech inversion system trained on data from one language is able to estimate sensor positions (or TVs) for the same language correlating at about r = 0.53 with the actual sensor positions (or TVs). Cross-language results show a reduced performance of r = 0.47.
منابع مشابه
Acoustic to articulatory inversion
The context of this work is speech analysis. The subject deals with acoustic-to-articulatory inversion, i.e. the recovery of the temporal evolution of the vocal tract shape from the signal. This topic is important because it is likely to give rise to applications in the domains of speech coding as well as second language learning. Acoustic-to-articulatory inversion relies on an analysis by synt...
متن کاملProcessing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures.
Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three dif...
متن کاملAutomatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.
An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech ...
متن کاملComparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents
This article presents an experimental comparison of two types of techniques, articulatory and acoustic, for transforming nonnative speech to sound more native-like. Articulatory techniques use articulators from a native speaker to drive an articulatory synthesizer of the non-native speaker. These methods have a good theoretical justification, but articulatory measurements (e.g., via electromagn...
متن کاملAcoustic-to-articulatory inversion using a speaker-normalized HMM-based speech production model
Acoustic-to-articulatory inverse mapping is a difficult problem because of its non-linear and oneto-many characteristics. We have previously developed a speech inversion method using a hidden Markov model (HMM)-based speech production model which takes into account the phonemespecific dynamic constraints of articulatory parameters. We found that the constraint significantly decreases the estima...
متن کامل